Speech and Gaze Control for Desktop Environments
Authors
Abstract
This chapter describes a multimodal system that integrates speech- and gaze-based inputs for interaction with a real desktop environment. In this system, multimodal interaction aims at overcoming the intrinsic limits of each input channel taken alone. The chapter introduces the main eye-tracking and speech-recognition technologies, and describes a multimodal system that integrates the two input channels by generating a real-time vocal grammar from gaze-driven contextual information. The proposed approach shows how the combined use of auditory and visual cues achieves mutual disambiguation in the interaction with a real desktop environment. As a result, the system enables the use of low-cost audio-visual devices for everyday tasks, even when traditional input devices, such as a keyboard or a mouse, are unsuitable for use with a personal computer.
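To illustrate the kind of gaze-driven grammar generation the abstract refers to, here is a minimal Python sketch, assuming a hypothetical list of desktop widgets and a JSGF-style grammar as output; the widget names, the fixation radius, and the helper functions (widgets_near_gaze, build_grammar) are illustrative assumptions, not the authors' implementation.

from dataclasses import dataclass

@dataclass
class Widget:
    name: str            # spoken label, e.g. "recycle bin"
    x: float             # screen position of the widget centre (pixels)
    y: float
    actions: tuple       # verbs that make sense for this widget

def widgets_near_gaze(widgets, gaze_x, gaze_y, radius=150.0):
    """Keep only the widgets whose centre lies within `radius` pixels of the gaze point."""
    return [w for w in widgets
            if (w.x - gaze_x) ** 2 + (w.y - gaze_y) ** 2 <= radius ** 2]

def build_grammar(candidates):
    """Emit a small JSGF-style grammar listing only the commands that apply
    to the objects the user is currently looking at."""
    rules = [f"( {' | '.join(w.actions)} ) {w.name}" for w in candidates]
    body = " | ".join(rules) if rules else "<NULL>"
    return f"#JSGF V1.0;\ngrammar desktop;\npublic <command> = {body};"

if __name__ == "__main__":
    desktop = [
        Widget("recycle bin", 900, 620, ("open", "empty")),
        Widget("text document", 880, 560, ("open", "delete", "rename")),
        Widget("web browser", 120, 80, ("open", "close")),
    ]
    # A fixation reported by the eye tracker near the bottom-right corner:
    print(build_grammar(widgets_near_gaze(desktop, 890, 600)))

Restricting the recognizer to such a context-dependent grammar is one plausible way the visual channel can disambiguate acoustically similar commands.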
Similar resources
Multimodal Dialogue for Ambient Intelligence and Smart Environments
Ambient Intelligence (AmI) and Smart Environments (SmE) are based on three foundations: ubiquitous computing, ubiquitous communication and intelligent adaptive interfaces [41]. This type of system consists of a series of interconnected computing and sensing devices which surround the user pervasively in his environment and are invisible to him, providing a service that is dynamically adapted t...
Affordances of Input Modalities for Visual Data Exploration in Immersive Environments
There has been a consistent push towards exploring novel input, display, and feedback technologies for sensemaking from data. However, most visual analytical systems in the wild that go beyond a traditional desktop utilize commercial large displays with direct touch, since they require the least effort to adapt from the desktop/mouse setting. There is a plethora of device technologies that are ...
Selective use of gaze information to improve ASR performance in noisy environments by cache-based class language model adaptation
Using information from a person’s gaze has potential to improve ASR performance in acoustically noisy environments. However, previous work has resulted in relatively minor improvements. A cache-based language model adaptation framework is presented where the cache contains a sequence of gaze events, classes represent visual context and task, and the relative importance of gaze events is conside...
The selective use of gaze in automatic speech recognition
The performance of automatic speech recognition (ASR) degrades significantly in natural environments compared to laboratory assessments. Being a major source of interference, acoustic noise affects speech intelligibility during the ASR process. There are two main problems caused by the acoustic noise. The first is the speech signal contamination. The second is the speakers’ vocal and non-voc...
How do people explore virtual environments?
Understanding how humans explore virtual environments is crucial for many applications, such as developing compression algorithms or designing effective cinematic virtual reality (VR) content, as well as to develop predictive computational models. We have recorded 780 head and gaze trajectories from 86 users exploring omnidirectional stereo panoramas using VR head-mounted displays. By analyzing...
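The cache-based, gaze-driven language model adaptation described in the third related entry above can be sketched as follows. This is a conceptual Python toy, assuming a simple unigram background model, a uniform within-class word distribution, and an exponential decay of gaze events; the class labels, half-life, and interpolation weight lam are invented for illustration and are not the framework evaluated in that work.

from collections import defaultdict

class GazeCacheLM:
    """Toy cache-based class language model: recent gaze events on objects of a
    given visual class raise the probability of words belonging to that class."""

    def __init__(self, base_unigram, word_to_class, lam=0.3, half_life=5.0):
        self.base = base_unigram          # background P(word)
        self.word_to_class = word_to_class
        self.lam = lam                    # interpolation weight of the gaze cache
        self.half_life = half_life        # seconds for a gaze event to lose half its weight
        self.cache = []                   # list of (timestamp, visual_class) gaze events

    def observe_gaze(self, timestamp, visual_class):
        self.cache.append((timestamp, visual_class))

    def _class_weights(self, now):
        # Exponentially decay older gaze events so that recent context dominates.
        weights = defaultdict(float)
        for t, cls in self.cache:
            weights[cls] += 0.5 ** ((now - t) / self.half_life)
        total = sum(weights.values())
        return {c: w / total for c, w in weights.items()} if total else {}

    def _p_word_given_class(self, word, cls):
        # Uniform distribution over the words mapped to the class.
        members = [w for w, c in self.word_to_class.items() if c == cls]
        return 1.0 / len(members) if word in members else 0.0

    def prob(self, word, now):
        # Interpolate the background unigram with the cache-derived class model.
        cls = self.word_to_class.get(word)
        cache_p = self._class_weights(now).get(cls, 0.0) * self._p_word_given_class(word, cls)
        return (1.0 - self.lam) * self.base.get(word, 1e-6) + self.lam * cache_p

if __name__ == "__main__":
    lm = GazeCacheLM(
        base_unigram={"open": 0.02, "document": 0.03, "printer": 0.001},
        word_to_class={"document": "DOCUMENT", "printer": "DEVICE"},
    )
    lm.observe_gaze(timestamp=10.0, visual_class="DEVICE")  # user fixated the printer icon
    # After the fixation, "printer" outscores the otherwise more frequent "document".
    print(lm.prob("printer", now=11.0) > lm.prob("document", now=11.0))  # True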
Journal:
Volume, issue:
Pages: -
Publication year: 2016